Consider the following experiment that proceeds in a sequence of rounds. For the first round, we have $n$ balls, which are thrown independently and uniformly at random into $n$ bins. After round $i$, for $i \ge 1$, we discard every ball that ended up in a bin by itself in round $i$. The remaining balls are retained for round $i + 1$, in which they are again thrown independently and uniformly at random into the $n$ bins.
\begin{itemize}
\item[(a)] If in some round there are $\epsilon n$ balls, how many balls would you expect to
have in the next round?
\item[(b)] Assuming that everything proceeds according to expectation, prove that
we would discard all the balls within $O(\log \log n)$ rounds.
\item[(c)] Prove that with probability $1 - o(1)$, the number of rounds is $O(\log \log n)$.
\end{itemize}
Hint: call a round good if the number of balls retained is not much more than expected. Compute the probability that a round is good. Show that with probability
$1 - o(1)$, we get enough good rounds among the first $O(\log \log n)$ to finish.
\\\\
\textbf{(a)} Suppose a round begins with $k$ balls. We define indicator random variables $X_1, X_2, \ldots , X_k$ so that
\[
X_i = \left\{
\begin{array}{ll}
1 & \text{if the $i$-th ball lands in a bin by itself}\\
0 & \text{otherwise}
\end{array}
\right.
\]
We have:
\[
\Pr[X_i = 1] = \left(1 - \dfrac{1}{n}\right)^{k-1}
\]
since this is exactly the probability that none of the other $k - 1$ balls lands in the same bin as ball $i$. The total number of balls that end up in a bin alone is $\sum_{i=1}^{k} X_i$, so by linearity of expectation:
\[
E\left[\sum_{i=1}^{k} X_i\right] = \sum_{i=1}^{k} E[X_i] = kE[X_1] = k\left(1 - \dfrac{1}{n}\right)^{k-1}
\]
Therefore, with $k = \epsilon n$, the expected number of balls that end up alone in a bin (and are therefore discarded) is, using $(1 - 1/n)^{n-1} \ge e^{-1}$ for the inequality:
\[
E\left[\sum_{i=1}^{k} X_i\right] = n\epsilon\left(1 - \dfrac{1}{n}\right)^{n\epsilon-1} \ge \dfrac{n\epsilon}{e^\epsilon}
\]
Thus, the expected number of balls retained for the next round is at most:
\[
n\epsilon - \dfrac{n\epsilon}{e^\epsilon} = n\epsilon\left(1 - \dfrac{1}{e^\epsilon}\right).
\]
\\
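As a quick sanity check of this estimate, the following sketch (the parameter values and the helper name are illustrative choices, not part of the exercise) throws $\epsilon n$ balls into $n$ bins and compares the average number of retained balls with $n\epsilon(1 - e^{-\epsilon})$:

```python
import math
import random

def retained_after_one_round(num_balls, n, rng):
    """Throw num_balls uniformly into n bins; return the number of balls
    that share a bin with at least one other ball (i.e., are retained)."""
    counts = [0] * n
    for _ in range(num_balls):
        counts[rng.randrange(n)] += 1
    return sum(c for c in counts if c >= 2)

rng = random.Random(0)  # fixed seed for reproducibility
n, eps, trials = 100_000, 0.5, 20
avg = sum(retained_after_one_round(int(eps * n), n, rng)
          for _ in range(trials)) / trials
predicted = eps * n * (1 - math.exp(-eps))  # n*eps*(1 - e^{-eps})
print(avg, predicted)
```

For these parameters the empirical average lands within a fraction of a percent of the formula, since the retained count is tightly concentrated around its mean.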
\textbf{(b)} Let $n\epsilon_i$ be the number of balls after $i$ rounds. Assuming everything proceeds according to expectation, $\epsilon_i$ satisfies the recurrence
\[
\epsilon_{i+1} = \epsilon_i(1 - e^{-\epsilon_i}) \le \epsilon_i(1 - (1 - \epsilon_i)) = \epsilon_i^2 \Rightarrow \log\epsilon_{i+1} \le 2\log\epsilon_i,
\]
where the inequality uses $e^{-x} \ge 1 - x$.
We have $\epsilon_0 = 1$, and by question (a) the expected fraction after round 1 is $\epsilon_1 = 1 - 1/e$. Unrolling the recurrence we get:
\begin{equation}
\log\epsilon_{i} \le 2\log\epsilon_{i-1} \le \cdots \le 2^{i-1}\log\epsilon_{1} = 2^{i-1}\log(1 - 1/e)
\label{eq:1}
\end{equation}

In order to reach a round where no balls are left, we need the smallest $i$ such that $n\epsilon_i < 1$, i.e., $\log\epsilon_i < -\log n$. By Equation \ref{eq:1} it suffices that $2^{i-1}\log(1 - 1/e) \le -\log n$, i.e., $2^{i-1}c \ge \log n$ where $c = -\log(1 - 1/e) > 0$; taking logarithms again gives $i = \Theta(\log\log n)$.
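Assuming every round behaves exactly as expected, the doubly logarithmic round count can be checked numerically by iterating the recurrence (the function name is just for illustration):

```python
import math

def rounds_until_empty(n):
    """Iterate eps_{i+1} = eps_i * (1 - exp(-eps_i)) from eps_0 = 1,
    counting rounds until fewer than one ball (n * eps < 1) remains."""
    eps, rounds = 1.0, 0
    while n * eps >= 1:
        eps *= 1 - math.exp(-eps)
        rounds += 1
    return rounds

for n in (10**3, 10**6, 10**12, 10**24):
    print(n, rounds_until_empty(n))
```

Squaring the exponent of $n$ (from $10^3$ to $10^6$ to $10^{12}$ to $10^{24}$) adds roughly one round each time, exactly the doubly logarithmic growth the bound predicts.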
\\\\
\textbf{(c)} In every round the expected fraction of balls retained is at most $1 - 1/e$: by question (a) the expected retained fraction is at most $\epsilon(1 - e^{-\epsilon})$, which is increasing in $\epsilon$ and equals $1 - 1/e$ at $\epsilon = 1$. Using Markov's inequality we get:
\[
\Pr[\text{remaining fraction} > c\mu] < \dfrac{1}{c}\text{, where $c > 1$}
\]
so the probability that the retained fraction exceeds $c(1 - 1/e)$ in every one of the first $O(\log\log n)$ rounds is at most:
\[
\left(\dfrac{1}{c}\right)^{O(\log\log n)} = o(1)
\]
Choosing $c > 1$ small enough that $c(1 - 1/e) < 1$, we conclude that with probability $1 - o(1)$ we reach a constant fraction $\epsilon = c(1 - 1/e) < 1$ within $O(\log\log n)$ rounds.

Even if a round does not take us from $\epsilon$ all the way down to $\epsilon^2$, but only to $t\epsilon^2$ for a constant $t > 1$, the recurrence still collapses doubly exponentially: multiplying by $t$ gives $t\epsilon_{i+1} \le (t\epsilon_i)^2$, so once $t\epsilon_i < 1$, solving the recurrence as in question (b) again shows that $O(\log\log n)$ rounds suffice if every round meets this relaxed expectation.

Following the hint, we call a round \emph{good} if the fraction of balls retained is less than $t$ times the expected one. Using Markov's inequality, we get the following bound for a bad round:
\[
\Pr[\text{remaining fraction} > t\mu] < \dfrac{1}{t} \Rightarrow \Pr[\text{good round}] > 1 - \dfrac{1}{t}
\]
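The Markov bound is quite loose here; a small simulation (all parameter values are illustrative choices) estimates the empirical probability that a round is good for, say, $t = 1.5$:

```python
import math
import random

def retained(num_balls, n, rng):
    """Number of balls landing in bins holding at least two balls."""
    counts = [0] * n
    for _ in range(num_balls):
        counts[rng.randrange(n)] += 1
    return sum(c for c in counts if c >= 2)

rng = random.Random(2)  # fixed seed for reproducibility
n, eps, t, rounds = 10_000, 0.5, 1.5, 200
expected = eps * n * (1 - math.exp(-eps))
good = sum(retained(int(eps * n), n, rng) <= t * expected
           for _ in range(rounds))
print(good / rounds)  # empirical Pr[good round]
```

In practice the retained count concentrates sharply around its mean, so the empirical good-round frequency comfortably exceeds the Markov guarantee of $1 - 1/t$.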

By the argument above, $O(\log\log n)$ good rounds suffice to discard all the balls; since each round is good with probability greater than $1 - 1/t$, in expectation fewer than $\frac{1}{1 - 1/t} \cdot O(\log\log n) = O(\log\log n)$ rounds are required.

The throws in different rounds are independent, so the good/bad indicators of successive rounds are independent, and we can apply a Chernoff bound to the number of good rounds among the first $b\log\log n$ rounds:
\[
\Pr[\text{number of rounds needed} > b\log\log n] < 2^{-B\log\log n} = \dfrac{1}{(\log n)^B}\text{, for constants $b, B > 0$}
\]
\]
Hence, with probability $1 - o(1)$, starting from a constant fraction $\epsilon < 1$ we need only $O(\log\log n)$ further rounds to discard all the balls. Consequently, the whole process takes $O(\log\log n)$ rounds with probability $1 - o(1)$ (using that $(1 - o(1))(1 - o(1)) = 1 - o(1)$).
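Finally, the whole process can be simulated end to end (the seed and parameters are illustrative); round counts for moderate $n$ already sit in the single digits, consistent with the $O(\log\log n)$ bound:

```python
import random

def rounds_to_discard_all(n, rng):
    """Repeatedly throw the surviving balls into n bins, discarding
    every ball that lands in a bin by itself, until none remain."""
    balls, rounds = n, 0
    while balls > 0:
        counts = {}
        for _ in range(balls):
            bin_idx = rng.randrange(n)
            counts[bin_idx] = counts.get(bin_idx, 0) + 1
        balls = sum(c for c in counts.values() if c >= 2)
        rounds += 1
    return rounds

rng = random.Random(3)  # fixed seed for reproducibility
print([rounds_to_discard_all(100_000, rng) for _ in range(5)])
```

Note that the simulation also covers the endgame the expectation argument glosses over: once only a handful of balls survive, they almost surely all land alone and are discarded in one further round.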